In our final project, we use data on Broadway weekly grosses. The data comes from Playbill. The weekly box office grosses comprise revenue and attendance figures for theatres that are part of The Broadway League. The data also includes a synopsis of each show. Using this data, we analyze Broadway grosses, synopses, and theatres.

The datasets are from Kaggle and FRED.

1. How does gross change over the years?

Gross and Average Price by year

The bars show the total gross of Broadway shows in each year from 1985 to 2019, and the line shows the average ticket price over the same period. Both gross and average price rise over the 35 years. The average price in 2019 is more than twice the average price in 1985. However, we cannot conclude that the growth in gross comes only from the increase in average price.

Seats

The number of seats sold per year is also a main factor that influences Broadway gross. The bar chart below shows the total seats sold in each year from 1985 to 2019. Total seats sold increase over the years, but not as much as the average price does.
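
Because yearly gross equals average price times seats sold, the growth in gross can be split into a price part and a volume part. The following is a minimal sketch in base R, assuming a weekly data frame with hypothetical columns `week_ending`, `weekly_gross`, and `seats_sold`; the numbers are made up for illustration and are not taken from the dataset.

```r
# Toy weekly records; column names and values are illustrative assumptions.
weekly <- data.frame(
  week_ending  = as.Date(c("1985-06-09", "1985-12-01", "2019-06-09", "2019-12-01")),
  weekly_gross = c(200000, 300000, 1200000, 1800000),
  seats_sold   = c(8000, 12000, 10000, 15000)
)
weekly$year <- as.integer(format(weekly$week_ending, "%Y"))

# Total gross and total seats sold per year.
yearly <- aggregate(cbind(weekly_gross, seats_sold) ~ year, data = weekly, FUN = sum)

# Average price per seat = total gross / total seats sold.
yearly$avg_price <- yearly$weekly_gross / yearly$seats_sold

# Growth ratios between the first and the last year.
first <- yearly[which.min(yearly$year), ]
last  <- yearly[which.max(yearly$year), ]
gross_ratio <- last$weekly_gross / first$weekly_gross
price_ratio <- last$avg_price / first$avg_price
seats_ratio <- last$seats_sold / first$seats_sold

print(yearly)
print(c(gross = gross_ratio, price = price_ratio, seats = seats_ratio))
```

By construction, `gross_ratio` equals `price_ratio * seats_ratio`, so comparing the two factors shows how much of the growth in gross is due to price and how much to attendance.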

2. What affects the total seats sold on Broadway?

Total seats and per capita personal income of New York

This part aims to find out whether income affects the purchase of Broadway tickets.

x: per capita personal income (abbreviated CPI in this report); y: total seats sold per year
Since prices have been rising over the years, gross would partly reflect price changes, so total seats sold per year is a more suitable measure of purchasing.

The scatter plot and the fitted line show that total seats sold has a positive relationship with per capita personal income.
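
A fitted line of this kind can be obtained with a simple linear regression. The sketch below uses base R's `lm()` on made-up yearly values; the column names `income` and `total_seats` are hypothetical, and this is not necessarily the exact model behind the plot.

```r
# Toy yearly data; values and column names are illustrative assumptions.
yearly <- data.frame(
  income      = c(20000, 30000, 40000, 50000, 60000),
  total_seats = c(7.0e6, 8.1e6, 9.0e6, 10.2e6, 10.9e6)
)

# Ordinary least squares fit of seats sold on per capita personal income.
fit <- lm(total_seats ~ income, data = yearly)

# Slope of the fitted line and the Pearson correlation.
slope <- coef(fit)[["income"]]
r <- cor(yearly$income, yearly$total_seats)

print(coef(fit))
print(r)
```

A positive `slope` corresponds to the upward fitted line; `plot(yearly$income, yearly$total_seats)` followed by `abline(fit)` would draw the scatter plot with the line on top.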

3. Is there any seasonal pattern?

According to the bar chart, there is no pronounced seasonal pattern in any of the years.
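
One way to check this numerically is a seasonal index: each month's mean weekly gross divided by that year's average. The sketch below assumes the month as the seasonal unit and uses hypothetical column names and made-up values.

```r
# Toy weekly records; column names and values are illustrative assumptions.
weekly <- data.frame(
  week_ending  = as.Date(c("2018-01-07", "2018-01-14", "2018-07-08",
                           "2018-07-15", "2019-01-06", "2019-07-07")),
  weekly_gross = c(100, 120, 110, 130, 140, 150)
)
weekly$year  <- as.integer(format(weekly$week_ending, "%Y"))
weekly$month <- as.integer(format(weekly$week_ending, "%m"))

# Mean weekly gross for each year-month combination.
monthly <- aggregate(weekly_gross ~ year + month, data = weekly, FUN = mean)

# Average of the monthly means within each year (ave() defaults to mean).
monthly$year_mean <- ave(monthly$weekly_gross, monthly$year)

# Seasonal index: values close to 1 in every month suggest little seasonality.
monthly$seasonal_index <- monthly$weekly_gross / monthly$year_mean

print(monthly[order(monthly$year, monthly$month), ])
```

Dividing by the yearly average removes the long-run growth in gross, so the indices of different years can be compared directly.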

4. What kind of words are repeated most often in the synopses?

From this part on, we use text visualization to analyze the synopses of all Broadway shows in more depth. First, let’s look at the words used most often across all synopses. The top three are ‘musical’, ‘broadway’, and ‘new’, followed by ‘music’, ‘life’, ‘love’, ‘man’, ‘winner’, ‘songs’, and so on.
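
The counting step behind such a ranking can be sketched in base R as follows. The synopses and the very short stop-word list are made up for illustration; a real analysis would use a full stop-word list from a text-mining package.

```r
# Toy synopses; invented for illustration only.
synopses <- c(
  "A new musical about love and life on Broadway.",
  "The Broadway musical returns with new songs.",
  "A man finds love in this new musical."
)

# Hypothetical, deliberately tiny stop-word list.
stop_words <- c("a", "the", "and", "on", "in", "with", "this", "about")

# Lowercase, split on anything that is not a letter, drop empties and stop words.
tokens <- unlist(strsplit(tolower(synopses), "[^a-z]+"))
tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]

# Word frequencies, most frequent first.
freq <- sort(table(tokens), decreasing = TRUE)
print(head(freq, 5))
```

The resulting `freq` table is the kind of input a word cloud or a bar chart of top words is built from.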

5. Positive tone VS negative tone

In this part, we use the Hu & Liu dictionary to calculate each word’s positive and negative score. The words are then divided into two groups, positive and negative, based on their scores. Again, we use a word cloud to show the differences between the two groups.

Positive words clearly outnumber negative words; as one would expect, shows tend to use more positive words to describe their plots. We also find that the Hu & Liu dictionary assigns many seemingly neutral words to one of the two groups, which makes the difference between them less obvious.
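
The dictionary-based split can be illustrated with a small sketch. The two word lists below are a hypothetical stand-in for the Hu & Liu lexicon, and the tokens are made up; only the matching logic is the point.

```r
# Hypothetical stand-in for the Hu & Liu lexicon: two short word lists.
positive_words <- c("love", "winner", "beautiful", "triumph")
negative_words <- c("tragic", "lost", "dark")

# Toy tokens from synopses; invented for illustration only.
tokens <- c("love", "musical", "winner", "tragic", "love", "new", "dark", "beautiful")

# Label each token; words found in neither list get NA and are left out.
sentiment <- ifelse(tokens %in% positive_words, "positive",
             ifelse(tokens %in% negative_words, "negative", NA))

# Number of positive and negative tokens (NA is excluded by default).
counts <- table(sentiment)
print(counts)

# Token lists that would feed the two word clouds.
positive_tokens <- tokens[!is.na(sentiment) & sentiment == "positive"]
negative_tokens <- tokens[!is.na(sentiment) & sentiment == "negative"]
```

In practice the lexicon would be loaded from a package rather than typed by hand; for example, `get_sentiments("bing")` in tidytext provides, to our knowledge, this same lexicon.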

6. Words appearing in the synopses of 1980s shows VS 2010s shows

How do the words in the synopses of 1980s shows and 2010s shows differ in frequency? We select the synopses of shows from 1985 to 1988 and of shows from 2017 to 2020, then compare the words used in these two groups to see whether word usage has changed greatly over time.

Interestingly, the result shows that 2010s shows use the words shared by both groups more often than 1980s shows do. In other words, 2010s shows are more inclined to repeat the same words than 1980s shows.
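
The selection and comparison can be sketched as below, with made-up shows and hypothetical column names. Note the use of `%in%` for the year filter: `year == c(1985, 1986, ...)` would compare element by element with recycling and can miss matching rows.

```r
# Toy data; years and synopses are invented for illustration only.
shows <- data.frame(
  year     = c(1985, 1987, 1990, 2017, 2019, 2005),
  synopsis = c("a classic love story", "a tragic tale of a king", "ignored",
               "a new musical comedy", "the hit new musical", "ignored too"),
  stringsAsFactors = FALSE
)

# %in% tests membership row by row, which is what a multi-year filter needs.
early <- shows[shows$year %in% 1985:1988, ]
late  <- shows[shows$year %in% 2017:2020, ]

# Relative frequency of each word within a group of synopses.
word_share <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens <- tokens[nzchar(tokens)]
  table(tokens) / length(tokens)
}
early_share <- word_share(early$synopsis)
late_share  <- word_share(late$synopsis)

# Words used in both periods, with their share in each period.
common <- intersect(names(early_share), names(late_share))
comparison <- data.frame(
  word  = common,
  early = as.numeric(early_share[common]),
  late  = as.numeric(late_share[common])
)
print(comparison)
```

Using shares instead of raw counts keeps the two periods comparable even when they contain different numbers of shows.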

7. Top 10 theatres on Broadway

We find that the top 10 theatres on Broadway have changed little from 1990 to 2020, while weekly gross has continued to increase over time.
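
A ranking of this kind can be sketched in base R as follows, assuming theatres are ranked by their mean weekly seats sold. The theatre names, column names, and values are made up, and the toy example keeps the top 2 instead of the top 10.

```r
# Toy weekly records; theatre names and values are illustrative assumptions.
weekly <- data.frame(
  theatre    = c("Theatre A", "Theatre A", "Theatre B",
                 "Theatre B", "Theatre C", "Theatre C"),
  seats_sold = c(9000, 11000, 14000, 16000, 5000, 7000),
  stringsAsFactors = FALSE
)

# Mean weekly seats sold per theatre.
ranking <- aggregate(seats_sold ~ theatre, data = weekly, FUN = mean)
names(ranking)[names(ranking) == "seats_sold"] <- "seat_mean"

# Sort from largest to smallest and keep the top n theatres.
n <- 2
top <- head(ranking[order(ranking$seat_mean, decreasing = TRUE), ], n)
print(top)
```

Repeating the same steps on the records of each period (for example one subset per decade) gives the lists that can then be compared over time.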